-
Notifications
You must be signed in to change notification settings - Fork 141
Serialization context #1102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Serialization context #1102
Conversation
56db57e to
3e53f35
Compare
8e0e193 to
4a0b661
Compare
4a0b661 to
6000b0d
Compare
temporalio/client.py
Outdated
| timeout=input.rpc_timeout, | ||
| ) | ||
|
|
||
| def _async_activity_data_converter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one seems odd. Is this when the client comes back to complete an activity out of band, they don't give us a lot of the information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, this was not how it should be. Changed now (following C#/Java, as suggested in #1102 (comment)) so that the async activity handle itself implements WithSerialiationContext, allowing users to supply context fields matching those that will be used on receipt of the payload by the activity worker.
temporalio/worker/_workflow.py
Outdated
| workflow = _RunningWorkflow( | ||
| self._create_workflow_instance(act, init_job) | ||
| ) | ||
| workflow_instance, det = self._create_workflow_instance(act, init_job) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems potentially problematic. The act given to create workflow instance is now given prior to decoding. Maybe not a problem, but is there a reason to change it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't currently need to be decoded at this stage but I think you're right that this can be done less intrusively: we can get the workflow ID from init_job.workflow_id. I'll make that change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does technically need to be decoded at this stage because info needs decoded memo and headers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section has been rewritten and the concerns here should be resolved now.
temporalio/client.py
Outdated
| WorkflowExecution._from_raw_info(v, self._client.data_converter) | ||
| WorkflowExecution._from_raw_info( | ||
| v, | ||
| self._client.data_converter._with_context( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to create a new context-specific converter for each page here? No strong opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, it feels better not to since the common case is all the same workflow. I've memoized it.
temporalio/client.py
Outdated
| ) -> None: | ||
| """Create workflow handle.""" | ||
| self._client = client | ||
| self._data_converter = client.data_converter._with_context( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is often created by people that don't care about it (e.g. they start a workflow and don't care about the handle). Are there concerns about creating a context-specific data converter in all cases even if it's never used? I wonder if we should build the converter each call when they make the call, same as top-level client calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made it a lazily-computed property.
I don't think we don't need to construct it on every call since users should not expect to be able to rely on dynamic behavior like that, and I don't think we should inline repetitive code. It does no I/O, so it's not obvious that we should optimize performance here, but I actually do think it's reasonable seeing as the constructor did not call any functions previously and as you say it may very well not be used.
temporalio/client.py
Outdated
| return self._client.data_converter._with_context( | ||
| ActivitySerializationContext( | ||
| namespace=self._client.namespace, | ||
| workflow_id=( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we did here in Java and .NET is had this async activity client/handle implement WithSerializationContext since we don't always have the workflow ID. This way, users who know some of this information can do a "with context" to get context-specific async activity client, and they can choose which fields they are ok being empty and such. The task token approach is by far the most common approach (though in general async activity completion is not that common), so I think we may need to just put this in front of users to let them set the context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've done that, following .NET.
temporalio/converter.py
Outdated
| namespace: str | ||
| workflow_id: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In .NET and Java we had a common interface for both workflow and activity serialization context to show they both had these two fields. No problem not doing here, just noting it if you wanted to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. I agree, since it's relatively important for users of this feature to understand that the workflow ID and namespace are available in both, a shared class makes sense. Done.
temporalio/converter.py
Outdated
| during serialization and deserialization. | ||
| """ | ||
|
|
||
| def with_context(self, context: Optional[SerializationContext]) -> Self: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least in other SDKs, I am not sure there is ever expected to be a situation where this is called with None. With context always assumes it will be with a context and developers don't have to code around the absence of one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed Optional here. In a previous version of the PR I was passing None to Nexus contexts but not any longer.
temporalio/worker/_activity.py
Outdated
| data_converter = self._data_converter | ||
| if activity.info: | ||
| context = temporalio.converter.ActivitySerializationContext( | ||
| namespace=activity.info.workflow_namespace, | ||
| workflow_id=activity.info.workflow_id, | ||
| workflow_type=activity.info.workflow_type, | ||
| activity_type=activity.info.activity_type, | ||
| activity_task_queue=self._task_queue, | ||
| is_local=activity.info.is_local, | ||
| ) | ||
| data_converter = data_converter._with_context(context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to get this off the running activity instead of recreating every heartbeat? I know we store the payload converter on the activity context which is being accessed here, maybe we should store the data converter instead and just return its payload converter from activity.payload_converter()? I am unsure if this affects how multiprocessing and pickling work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, but it's not an immediate performance concern, and a non-trivial refactor, so I think we should leave it for a follow-on PR. Here's an untested branch: dan-9986-serialization-context...dan-9986-serialization-context-activity-context-dataconverter
temporalio/worker/_workflow.py
Outdated
| workflow = _RunningWorkflow( | ||
| self._create_workflow_instance(act, init_job) | ||
| ) | ||
| workflow_instance, det = self._create_workflow_instance(act, init_job) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am concerned with the refactoring that act (and its init_job) has not run through the codec by this point where it had before. I think the logic needs to stay using the codec before workflow instance creation code, but you need to extract the workflow ID from the init_job or the running workflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, we're doing that now.
temporalio/worker/_workflow.py
Outdated
| act: temporalio.bridge.proto.workflow_activation.WorkflowActivation, | ||
| init: temporalio.bridge.proto.workflow_activation.InitializeWorkflow, | ||
| ) -> WorkflowInstance: | ||
| ) -> tuple[WorkflowInstance, WorkflowInstanceDetails]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure we need to change the entire return type here just to get workflow ID. Caller just extract it out of the init job and we don't have to mutate this code at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, the changes to this function have been reverted in connection to discussion above.
| self._payload_converter_class = det.payload_converter_class | ||
| self._failure_converter_class = det.failure_converter_class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure we need to store these. The "with context" can be called on the already-created converters, we should not re-instantiate converters more than once per instance IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline and resolved in recent commits
| payload_converter = self._payload_converter_class() | ||
| failure_converter = self._failure_converter_class() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mentioned above, but I don't believe we need to reinstantiate converters multiple times in this instance, just call the "with context" on the already existing ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline and resolved in recent commits
adf8bc3 to
d3311e6
Compare
3f41215 to
ba1bfbb
Compare
This reverts commit e8e62bf6db3d5d054f31fa03141bb18c39a4bdb1.
8d7bba3 to
d8c4f17
Compare
cretz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing blocking. I do think we added plenty of cycles to user code that don't need it, but they are not enough to block the PR
| command.set_patch_marker.deprecated = deprecated | ||
| return use_patch | ||
|
|
||
| def workflow_payload_converter(self) -> temporalio.converter.PayloadConverter: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be worth documenting in activity.payload_converter() and workflow.payload_converter() doc strings that these are context-specific payload converters, but not that important
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done.
| if isinstance(payload_converter, temporalio.converter.WithSerializationContext): | ||
| payload_converter = payload_converter.with_context(context) | ||
| if isinstance(failure_converter, temporalio.converter.WithSerializationContext): | ||
| failure_converter = failure_converter.with_context(context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems there are several cases where we are calling a user's with context on a failure converter even if we don't need it. It's not a big deal of course, just extra unneeded instantiations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've split the method to construct the payload and failure converter separately.
| self, | ||
| command_info: Optional[_command_aware_visitor.CommandInfo], | ||
| ) -> Optional[temporalio.converter.SerializationContext]: | ||
| workflow_context = temporalio.converter.WorkflowSerializationContext( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many times in this method, this context isn't needed, why instantiate it every time as if it's always needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason to instantiate it in one location is to avoid repeating code. In general, that's a more important concern than avoiding an extra instantiation of a frozen dataclass holding two already-computed strings.
temporalio/converter.py
Outdated
| is_local: bool | ||
|
|
||
|
|
||
| # TODO: duck typing or nominal typing? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe remove todo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, done.
| new_instance = type(self)() # Must have a nullary constructor | ||
| new_instance._set_converters(*converters) | ||
| return new_instance | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Payload Converter Context Propagation Issue
In CompositePayloadConverter.with_context, creating a new instance with type(self)() can break subclasses that have custom constructors or additional state. Furthermore, the _any_converter_takes_context flag isn't updated after setting the new converters, which can lead to with_context incorrectly returning self or failing to propagate the serialization context to its component converters.
Fixes #796
Problem statement
Solution
data_converter._with_context(context).AsyncActivityHandleto allow custom contextTODO: test thiswith_contextmethods on any of the following components:with_contexton anEncodingPayloadConverter. These are used by the SDK’s default payload converter.with_contextmethod onAsyncActivityHandlethat they can call their own activity context.Details
Python Serialization Context
Background
Data converter
Recall that a
DataConverteris a dataclass with three fields:payload_converter_class: a nullary factory function returning aPayloadConverterfailure_converter_class: a nullary factory function returning aFailureConverterPayloadCodecFor the payload converter and failure converter, the name and type annotation expect that the user supplies a subclass of
PayloadConverter/FailureConverter, as opposed to a generic callable.Example usage
At various points in client and worker code, these 3 instances are used to transform outbound and inbound data. For example, here is client code converting outbound workflow input args to
Payload:It calls
DataConverter.encode(), which does both serialization and encoding:Here’s an example of worker code using a failure converter:
Interfaces
Users can replace any of the three fields on the default data converter.
Replacing the payload codec or failure converter is straightforward.
Payload codec
For payload codec the user must implement the following interface and set an
encodingproperty:Failure converter
Failureis the name of the proto struct we use for encoding failures across languages. For failure converter the user must implement:Payload converter
The
PayloadConverterinterface is:By default, the payload converter is a
CompositePayloadConverterwhich contains multipleEncodingPayloadConverters, which are tried in order until one succeeds. To replace the payload converter, a user normally creates a subclass ofCompositePayloadConverterthat prepends a customEncodingPayloadConvertersonto the existing collection.An
EncodingPayloadConverteris an interface with a stringencodingfield andProblem statement
Solution
data_converter._with_context(context).AsyncActivityHandleto allow custom contextTODO: test thiswith_contextmethods on any of the following components:with_contexton anEncodingPayloadConverter. These are used by the SDK’s default payload converter.with_contextmethod onAsyncActivityHandlethat they can call their own activity context.with_contexton a payload converterSuppose a user implements
We will now use their
with_context()when constructing the payload converter that we actually use.with_contexton anEncodingPayloadConverterHowever, to customize a payload converter, a user normally creates a subclass of
CompositePayloadConverterthat prepends a customEncodingPayloadConverteronto the existing collection. We will make it possible for users to implement context-awareto_payloadandfrom_payloadmethods on theirEncodingPayloadConverter.For this, the SDK must implement
with_contextonCompositePayloadConvertersuch that:with_contexton any of the user’sEncodingPayloadConvertersCompositePayloadConverterNote
Introduce serialization context (workflow/activity) and propagate it across payload/failure conversion and codecs throughout client and workers; add AsyncActivityHandle.with_context and extensive tests.
SerializationContext,WorkflowSerializationContext,ActivitySerializationContext, andWithSerializationContext.DataConverter.with_context,CompositePayloadConverter.with_context, and context propagation to payload/failure converters and codecs.CommandAwarePayloadVisitorand contextvar-based command tracking._CommandAwarePayloadCodec.WorkflowExecution.data_converter(contextual) and refactor related constructors.AsyncActivityHandle.with_contextand support per-call data converter overrides for async activity APIs.workflow.payload_converter()andactivity.payload_converter()now return converters with context set.clippyin latest-deps job.Written by Cursor Bugbot for commit c78016e. This will update automatically on new commits. Configure here.